Hypothesis-based feature combination of multiple speech inputs for robust speech recognition in automotive environments
Abstract
In a microphone array system, feature combination in the MFCC domain can improve speech recognition accuracy. Multiple microphones yield different feature parameters, such as MFCCs, even when they capture similar speech and noise signals, because of phase differences and differing transmission characteristics. In this paper, we investigate how recognition performance changes when multiple MFCC feature vectors are averaged. In addition, we extend Hypothesis-Based Feature Combination (HBFC), which we previously proposed for dual-microphone systems, to multi-input systems. Experimental results show that variance re-scaling is necessary when multiple inputs are combined with Cepstral Mean Normalization (CMN), for both MFCC averaging and HBFC. However, better results are obtained without variance re-scaling when Mean and Variance Normalization (MVN) is used with MFCC averaging or HBFC. In experiments on a database collected in a real automotive environment, HBFC-MVN reduced recognition errors by 22% compared with the baseline single-microphone system.
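The effect of averaging on feature variance can be illustrated with a toy sketch. This is not the authors' implementation and does not cover HBFC. It assumes synthetic "MFCC" frames made of a shared component plus independent per-microphone noise, and the helper names (`cmn`, `mvn`, `average_streams`, `make_toy_streams`) are hypothetical. It only shows that frame-wise averaging shrinks the feature variance, that CMN leaves this shrinkage in place, and that MVN removes it, which is one plausible reading of why variance re-scaling matters with CMN but not with MVN.

```python
import random
from statistics import fmean, pstdev


def column(frames, d):
    """Return coefficient d across all frames."""
    return [frame[d] for frame in frames]


def cmn(frames):
    """Cepstral Mean Normalization: subtract the per-coefficient utterance mean."""
    dims = range(len(frames[0]))
    means = [fmean(column(frames, d)) for d in dims]
    return [[frame[d] - means[d] for d in dims] for frame in frames]


def mvn(frames):
    """Mean and Variance Normalization: zero mean, unit variance per coefficient."""
    dims = range(len(frames[0]))
    means = [fmean(column(frames, d)) for d in dims]
    stds = [pstdev(column(frames, d)) or 1.0 for d in dims]  # guard against zero
    return [[(frame[d] - means[d]) / stds[d] for d in dims] for frame in frames]


def average_streams(streams):
    """Frame-wise average of time-aligned feature streams (one per microphone)."""
    n_frames = len(streams[0])
    dims = range(len(streams[0][0]))
    return [[fmean([s[t][d] for s in streams]) for d in dims]
            for t in range(n_frames)]


def make_toy_streams(n_mics=4, n_frames=500, n_dims=3, seed=0):
    """Synthetic data (assumption): shared component + independent noise per mic."""
    rng = random.Random(seed)
    shared = [[rng.gauss(0, 1) for _ in range(n_dims)] for _ in range(n_frames)]
    return [[[x + rng.gauss(0, 1) for x in frame] for frame in shared]
            for _ in range(n_mics)]


streams = make_toy_streams()
single_cmn = cmn(streams[0])
avg_cmn = cmn(average_streams(streams))
avg_mvn = mvn(average_streams(streams))

std_single = pstdev(column(single_cmn, 0))
std_avg = pstdev(column(avg_cmn, 0))
std_mvn = pstdev(column(avg_mvn, 0))

print(f"single mic + CMN std: {std_single:.3f}")
print(f"average    + CMN std: {std_avg:.3f}")
print(f"average    + MVN std: {std_mvn:.3f}")
```

In this toy setting the averaged stream has a smaller standard deviation than any single stream after CMN, so a model trained on single-microphone statistics would see mismatched feature scales unless the variance is re-scaled. After MVN the scale is fixed at one by construction.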
Similar resources
A new method for robust speech recognition based on missing data using a bidirectional neural network
The performance of speech recognition systems is greatly reduced when speech is corrupted by noise. One common approach to robust speech recognition is the missing-feature method. In this approach, the components of the time-frequency representation of the signal (spectrogram) that have a low signal-to-noise ratio (SNR) are tagged as missing and deleted, then replaced by the remaining components and statistical ...
Improving of Feature Selection in Speech Emotion Recognition Based-on Hybrid Evolutionary Algorithms
One of the important issues in speech emotion recognition is selecting appropriate feature sets in order to improve the detection rate and classification accuracy. In previous studies, researchers tried to select appropriate features for classification by using feature selection and feature-space reduction methods, such as Fisher and PCA. In this research, a hybrid evolutionary algorit...
An Information-Theoretic Discussion of Convolutional Bottleneck Features for Robust Speech Recognition
Convolutional Neural Networks (CNNs) have shown their performance in speech recognition systems for feature extraction and for acoustic modeling. In addition, CNNs have been used for robust speech recognition, and competitive results have been reported. A Convolutive Bottleneck Network (CBN) is a kind of CNN that has a bottleneck layer among its fully connected layers. The bottleneck fea...
Improving the performance of MFCC for Persian robust speech recognition
Mel-frequency cepstral coefficients (MFCCs) are the most widely used features in speech recognition, but they are very sensitive to noise. In this paper, to achieve satisfactory performance in Automatic Speech Recognition (ASR) applications, we introduce a new noise-robust set of MFCC vectors estimated through the following steps. First, spectral mean normalization is a pre-processing step which is applied to t...
Persian Phone Recognition Using Acoustic Landmarks and Neural Network-based variability compensation methods
Speech recognition is a subfield of artificial intelligence that develops technologies to convert speech utterances into transcriptions. So far, various methods such as hidden Markov models and artificial neural networks have been used to develop speech recognition systems. In most of these systems, the speech signal frames are processed uniformly, while the information is not evenly distributed ...